Smart Paper Notes

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
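For readers unfamiliar with two of the practices the survey asks about, the following is a minimal sketch of K-fold splitting and of ensembling by prediction averaging. It is not taken from the paper; the function names `k_fold_indices` and `ensemble_mean` are hypothetical, and the contiguous, unshuffled folds are a simplifying assumption.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def ensemble_mean(predictions: List[List[float]]) -> List[float]:
    """Average per-sample predictions from several models (e.g. one per fold)."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]
```

In practice one model would be trained per fold on `train` and checked on `val`; averaging the outputs of those models is one simple form of ensembling over identical architectures.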


Robust Implementation of Foreground Extraction and Vessel Segmentation for X-ray Coronary Angiography Image Sequence

Zeyu Fu, Zhuang Fu, Chenzhuo Lv, Jun Yan

Categories: Computer Vision

2022-09-15

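Extracting contrast-filled vessels from X-ray coronary angiography (XCA) image sequences has important clinical significance for intuitive diagnosis and therapy. In this study, the XCA image sequence O is regarded as a three-dimensional tensor input, the vessel layer H as a sparse tensor, and the background layer B as a low-rank tensor. Using tensor nuclear norm (TNN) minimization, a novel vessel layer extraction method based on tensor robust principal component analysis (TRPCA) is proposed. Furthermore, to account for the irregular movement of vessels and the dynamic interference of surrounding irrelevant tissues, a total variation (TV) regularized spatial-temporal constraint is introduced to separate the dynamic background E. A two-stage region growing (TSRG) method is then used for vessel enhancement and segmentation. Global threshold segmentation serves as preprocessing to obtain the main branch, a Radon-like features (RLF) filter is used to enhance and connect broken minor segments, and the final vessel mask is constructed by combining the two intermediate results. We evaluated the visibility of the foreground extracted by the TV-TRPCA algorithm and the accuracy of the vessel segmentation produced by the TSRG algorithm on real clinical XCA image sequences and on a third-party database. Both qualitative and quantitative results verify the superiority of the proposed methods over existing state-of-the-art approaches.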
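As a rough sketch of the decomposition described above, the generic TRPCA problem can be written with the abstract's symbols as follows. This is the standard textbook form, not necessarily the paper's exact objective: the trade-off weight \lambda is an assumed parameter, and the TV-regularized term that isolates the dynamic background E is omitted because the abstract does not give its form.

```latex
\min_{B,\,H}\; \|B\|_{\mathrm{TNN}} + \lambda \,\|H\|_{1}
\quad \text{subject to} \quad O = B + H
```

Here \|B\|_{\mathrm{TNN}} encourages a low-rank background and \|H\|_{1} encourages a sparse vessel layer.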


SIND: A Drone Dataset at Signalized Intersection in China

Yanchao Xu, Wenbo Shao, Jun Li, Kai Yang, Weida Wang, Hua Huang, Chen Lv, Hong Wang

Categories: Computer Vision

2022-09-06

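Intersections are among the most challenging scenarios for autonomous driving tasks. Because of their complexity and stochasticity, essential applications at intersections (e.g., behavior modeling, motion prediction, and safety validation) rely heavily on data-driven techniques. There is therefore strong demand for trajectory datasets of traffic participants (TPs) at intersections. Most intersections in urban areas are currently equipped with traffic lights, yet there is no large-scale, high-quality, publicly available trajectory dataset for signalized intersections. In this paper, a typical two-phase signalized intersection in Tianjin, China is selected, and a pipeline is designed to construct a Signalized INtersection Dataset (SIND), which contains 7 hours of recordings covering more than 13,000 TPs of 7 types. The traffic light violations in SIND are then recorded, and SIND is compared with other similar works. The features of SIND can be summarized as follows: 1) SIND provides more comprehensive information, including traffic light states, motion parameters, a high-definition (HD) map, etc. 2) The categories of TPs are diverse and distinctive, with vulnerable road users (VRUs) accounting for up to 62.6%. 3) Multiple traffic light violations by non-motorized vehicles are shown. We believe that SIND will be an effective supplement to existing datasets and can promote related research on autonomous driving. The dataset is available online at: https://github.com/sotif-avlab/sind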


Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

Categories: Machine Learning

2022-08-02

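Recently, pioneering research has proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant-Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance and showing that different subbands contribute differently to audio deepfake detection. However, this work lacks an explanation of the specific information carried in each subband, and these features also lose information such as phase. In speech synthesis, fundamental frequency (F0) information is used to improve the quality of the synthetic speech, yet the F0 of synthetic speech is still too averaged and differs significantly from that of real speech. F0 is therefore expected to be important information for discriminating between bona fide and fake speech, but it cannot be used directly because of its irregular distribution. Instead, the frequency band containing most of the F0 is selected as the input feature. Meanwhile, to make full use of phase and full-band information, we also propose using real and imaginary spectrogram features as complementary input features and modeling the disjoint subbands separately. Finally, the results for F0 and for the real and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that the proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all other systems.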
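The EER metric quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one simple way to estimate it from two lists of detector scores. It is an illustration only, not the official ASVspoof scoring tool, and it assumes that higher scores mean "more likely bona fide"; the function name `equal_error_rate` is hypothetical.

```python
from typing import Sequence, Tuple


def equal_error_rate(bonafide: Sequence[float], spoof: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) estimated by sweeping every observed score.

    A trial is accepted when its score is >= threshold. The threshold chosen is
    the one where the false acceptance rate (spoof accepted) and the false
    rejection rate (bona fide rejected) are closest; the EER is their mean.
    """
    if not bonafide or not spoof:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(bonafide) | set(spoof)):
        far = sum(s >= thr for s in spoof) / len(spoof)
        frr = sum(s < thr for s in bonafide) / len(bonafide)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr
```

With finitely many scores the two error rates rarely match exactly, which is why the sketch reports the mean at the closest point; production tools usually interpolate along the error curve instead.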


Edge-Enhanced Dual Discriminator Generative Adversarial Network for Fast MRI with Parallel Imaging Using Multi-view Information

Jiahao Huang, Weiping Ding, Jun Lv, Jingwen Yang, Hao Dong, Javier Del Ser, Jun Xia, Tiaojuan Ren, Stephen Wong, Guang Yang

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2021-12-10

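In clinical medicine, magnetic resonance imaging (MRI) is one of the most important tools for diagnosis, triage, prognosis, and treatment planning. However, MRI suffers from an inherently slow data acquisition process because data are collected sequentially in k-space. In recent years, most MRI reconstruction methods proposed in the literature have focused on holistic image reconstruction rather than on enhancing edge information. This work departs from that trend by concentrating on the enhancement of edge information. Specifically, we introduce a novel parallel imaging coupled dual discriminator generative adversarial network (PIDD-GAN) for fast multi-channel MRI reconstruction that incorporates multi-view information. The dual discriminator design aims to improve the edge information in MRI reconstruction: one discriminator is used for holistic image reconstruction, while the other is responsible for enhancing edge information. An improved U-Net with local and global residual learning is proposed for the generator. Frequency channel attention blocks (FCA blocks) are embedded in the generator to incorporate attention mechanisms. A content loss is introduced to train the generator for better reconstruction quality. We performed comprehensive experiments on the Calgary-Campinas public brain MR dataset and compared our method with state-of-the-art MRI reconstruction methods. Ablation studies of residual learning were conducted on the MICCAI13 dataset to validate the proposed modules. The results show that PIDD-GAN provides high-quality reconstructed MR images with well-preserved edge information. Single-image reconstruction takes less than 5 ms, which meets the demand for faster processing.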


SAGCI-System: Towards Sample-Efficient, Generalizable, Compositional, and Incremental Robot Learning

Jun Lv, Qiaojun Yu, Lin Shao, Wenhai Liu, Wenqiang Xu, Cewu Lu

Categories: Robotics | Artificial Intelligence | Computer Vision | Machine Learning

2021-11-29

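Building general-purpose robots that perform a wide range of tasks in a large variety of environments at a human level is notoriously difficult. It requires robot learning to be sample-efficient, generalizable, compositional, and incremental. In this work, we introduce a systematic learning framework called SAGCI-System that works toward these four requirements. Our system first takes the raw point clouds gathered by a camera mounted on the robot's wrist as input and produces an initial model of the surrounding environment, represented as a URDF. The system adopts a learning-augmented differentiable simulation that loads the URDF. The robot then uses interactive perception to interact with the environment and to modify the URDF. Leveraging the simulation, we propose a new model-based RL algorithm that combines object-centric and robot-centric approaches to efficiently produce policies for accomplishing manipulation tasks. We apply our system to articulated object manipulation, both in simulation and in the real world. Extensive experiments demonstrate the effectiveness of the proposed learning framework. Supplementary materials and videos are available at https://sites.google.com/view/egci.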


StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma, Suzhen Wang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Zhidong Deng, Xin Yu

Categories: Computer Vision

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.


Further Improving Weakly-supervised Object Localization via Causal Knowledge Distillation

Feifei Shao, Yawei Luo, Shengjian Wu, Qiyi Li, Fei Gao, Yi Yang, Jun Xiao

Categories: Computer Vision

2023-01-03

Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map so as to perceive the whole object, yet they ignore the co-occurrence confounder of object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. In addition, the use of CAM creates a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in addressing the dilemma between classification and localization performance.
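As background for the CAM baseline mentioned above, the sketch below computes a plain class activation map as a weighted sum of the final convolutional feature maps, using the classifier weights of one class. It illustrates standard CAM only, not KD-CI-CAM; the function name `class_activation_map` and the nested-list data layout are assumptions made to keep the example dependency-free.

```python
from typing import List, Sequence

Grid = List[List[float]]


def class_activation_map(
    features: Sequence[Grid], fc_weights: Sequence[Sequence[float]], class_idx: int
) -> Grid:
    """Weighted sum of K feature maps (each H x W) for one class, scaled to [0, 1].

    features:   K feature maps from the last convolutional layer.
    fc_weights: one row of K weights per class (linear layer after pooling).
    """
    weights = fc_weights[class_idx]
    if len(weights) != len(features):
        raise ValueError("need one weight per feature map")
    height, width = len(features[0]), len(features[0][0])
    cam = [
        [sum(w * fmap[y][x] for w, fmap in zip(weights, features)) for x in range(width)]
        for y in range(height)
    ]
    # Min-max normalise so the map can be thresholded into a localization mask.
    low = min(min(row) for row in cam)
    high = max(max(row) for row in cam)
    span = high - low
    if span == 0:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - low) / span for v in row] for row in cam]
```

Because the map only highlights whatever drives the classification score, context that co-occurs with the object (the water around a fish) can light up as well, which is the confounding effect the abstract targets.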


Surveillance Face Anti-spoofing

Hao Fang, Ajian Liu, Jun Wan, Sergio Escalera, Chenxu Zhao, Xu Zhang, Stan Z. Li, Zhen Lei

Categories: Computer Vision

2023-01-03

Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (e.g., phone unlocking) and gives little consideration to long-distance scenes (e.g., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In such scenes, low image resolution and noise interference are new challenges for surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping contrastive learning strategies obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.


Edge Enhanced Image Style Transfer via Transformers

Chiyu Zhang, Jun Yang, Zaiyan Dai, Peng Cao

Categories: Computer Vision

2023-01-02

In recent years, arbitrary image style transfer has attracted more and more attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to balance the content details against the style features. When an image is stylized with sufficient style patterns, the content details may be damaged, and sometimes the objects in the image cannot be distinguished clearly. For this reason, we present a new transformer-based method named STT for image style transfer, together with an edge loss that visibly enhances the content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
